Near Optimal Dimensionality Reductions That Preserve Volumes
Authors
Abstract
Let P be a set of n points in Euclidean space and let 0 < ε < 1. A well-known result of Johnson and Lindenstrauss states that there is a projection of P onto a subspace of dimension O(ε⁻² log n) such that distances change by at most a factor of 1 + ε. We consider an extension of this result. Our goal is to find an analogous dimension reduction where not only pairs but all subsets of at most k points maintain their volume approximately. More precisely, we require that sets of size s ≤ k preserve their volumes within a factor of (1 + ε)^(s−1). We show that this can be achieved using O(max{k/ε, ε⁻² log n}) dimensions. In particular, for k = O(log n / ε) we require no more dimensions (asymptotically) than in the special case k = 2 handled by Johnson and Lindenstrauss. Our work improves on a result of Magen (which required as many as O(k ε⁻² log n) dimensions) and is tight up to a factor of O(1/ε). Another outcome of our work is an alternative and greatly simplified proof of Magen's result that all distances between points and affine subspaces spanned by a small number of points are approximately preserved when projecting onto O(k ε⁻² log n) dimensions.
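To illustrate the baseline (k = 2) guarantee the abstract builds on, the sketch below applies a standard Gaussian random projection — one common Johnson–Lindenstrauss construction, not necessarily the authors' specific projection — and checks the distortion of a pairwise distance empirically. All dimensions, constants, and the choice of ε are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, D, eps = 200, 1000, 0.5
# a commonly used JL target dimension, d = O(eps^-2 log n);
# the constant 4 is an illustrative choice, not a tight bound
d = int(np.ceil(4 * np.log(n) / eps**2))

P = rng.standard_normal((n, D))               # n points in R^D
G = rng.standard_normal((D, d)) / np.sqrt(d)  # scaled Gaussian projection matrix
Q = P @ G                                     # projected points in R^d

# empirical distortion of one pairwise distance:
# with high probability, ratio lies in [1 - eps, 1 + eps]
orig = np.linalg.norm(P[0] - P[1])
proj = np.linalg.norm(Q[0] - Q[1])
ratio = proj / orig
```

Extending this guarantee from pairs to s-point subsets (volumes distorted by at most (1 + ε)^(s−1)) without inflating d beyond O(max{k/ε, ε⁻² log n}) is the paper's contribution.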
Similar Papers
6.1 Dimensionality Reduction
Previously in the course, we have discussed algorithms suited for a large number of data points. This lecture discusses the case when the dimensionality of the data points becomes large. We denote the data set as x1, x2, …, xn ∈ R^D for D ≫ n, and will consider dimensionality reductions f : R^D → R^d for d ≪ D. We would like the function f to preserve some properties of the original data set, such a...
An Intelligent Credit Forecasting System Using Supervised Nonlinear Dimensionality Reductions
Kernel classifiers (such as support vector machines) have been successfully applied in numerous areas, and have demonstrated excellent performance. However, due to the high dimensionality and nonlinear distribution of financial input data in credit rating forecasting, finding a suitable low dimensional subspace by nonlinear dimensionality reductions is a key step to improve classifier performan...
Euclidean Embeddings that Preserve Volumes
Let P be a set of n points in Euclidean space and let 0 < ε < 1. A well-known result of Johnson and Lindenstrauss states that there is a projection of P onto a subspace of dimension O(ε⁻² log n) such that distances change by a factor of 1 + ε at most. We consider an extension of this result. Our goal is to find an analogous dimension reduction where not only pairs, but all subsets of at most k p...
A Scalable DBMS for Large Scientific Simulations
Scientific simulations evolve rather fast. Both the logical organization of the underlying database and the scientist's view of data change rapidly. The underlying DBMS must provide appropriate support for the evolution of scientific simulations, their rapidly increasing computational intensity, as well as the growing volumes and dimensionality of scientific data. ADAMS is a dynamic and scalable a...
An approximate dynamic programming framework for modeling global climate policy under decision-dependent uncertainty
Analyses of global climate policy as a sequential decision under uncertainty have been severely restricted by dimensionality and computational burdens. As a result, such analyses have limited the number of decision stages, discrete actions, or the number and type of uncertainties considered. In particular, other formulations have difficulty modeling endogenous or decision-dependent uncertainties, in which the...